Sentiment Lexicon Generation for an Under-Resourced Language
نویسندگان
چکیده
Sentiment analysis and opinion mining are actively explored nowadays. One of the most important resources for the sentiment analysis task is sentiment lexicon. This paper presents our study in building domain-specific sentiment lexicon for Indonesian language. Our main contributions are (1) methods to expand sentiment lexicon using sentiment patterns and (2) a technique to classify the polarity of a word using the sentiment score. Our method is able to generate sentiment lexicon automatically by using a small seed of sentiment words, user reviews, and part-ofspeech (POS) tagger. We develop the lexicon for Indonesian language using a set of seed words translated from English sentiment lexicon and expand them using sentiment patterns found in the user reviews. Our results show that the proposed method can generate additional lexicon with sentiment accuracy of 77.7%.
منابع مشابه
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis problem is one of the emerging fields. This problem deals with information extraction and knowledge discovery from textual data using natural language processing has attracted the attention of many researchers. Construction of sentiment lexicon as a valuable language resource is a one of the imp...
متن کاملSANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis
The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values). In spite of the fair amount of work on Arabic sentiment analysis over the past few years, e.g., (Abbasi et al., 2008; Abdul-Mageed et al., 2014; Abdu...
متن کاملBenchmarking Machine Translated Sentiment Analysis for Arabic Tweets
Traditional approaches to Sentiment Analysis (SA) rely on large annotated data sets or wide-coverage sentiment lexica, and as such often perform poorly on under-resourced languages. This paper presents empirical evidence of an efficient SA approach using freely available machine translation (MT) systems to translate Arabic tweets to English, which we then label for sentiment using a state-of-th...
متن کاملیک چارچوب نیمهنظارتی مبتنی بر لغتنامه وفقی خودساخت جهت تحلیل نظرات فارسی
With the appearance of Web 2.0 and 3.0, users’ contribution to WWW has created a huge amount of valuable expressed opinions. Considering the difficulty or impossibility of manually analyzing such big data, sentiment analysis, as a branch of natural language processing, has been highly considered. Despite the other (popular) languages, a limited number of research studies have been conducted in ...
متن کاملTowards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models
State-of-the-art automatic speech recognition and text-to-speech systems are based on subword units, typically phonemes. This necessitates a lexicon that maps each word to a sequence of subword units. Development of a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not be always readily available, particularly for under-resourced languages. In su...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Int. J. Comput. Linguistics Appl.
دوره 5 شماره
صفحات -
تاریخ انتشار 2014